Skip to content

C++/Python: Better syntax for false positives and negatives in inline expectations#4571

Merged
jbj merged 14 commits into
github:mainfrom
MathiasVP:better-syntax-for-false-positives-and-negatives-inline-expectation
Nov 3, 2020
Merged

C++/Python: Better syntax for false positives and negatives in inline expectations#4571
jbj merged 14 commits into
github:mainfrom
MathiasVP:better-syntax-for-false-positives-and-negatives-inline-expectation

Conversation

@MathiasVP

@MathiasVP MathiasVP commented Oct 28, 2020

Copy link
Copy Markdown
Contributor

Fixes https://github.com/github/codeql-c-analysis-team/issues/75.

The syntax for annotating inline expectations is now:

// $ tag1,tag2=x MISSING: tag3=y SPURIOUS: tag4=z

Any of the "columns" in the comment can be left out, so we don't have to yell all the time:

// $ tag1,tag2=x

I had to replace some of the regex capture-group techniques with manual parsing as my regex skills didn't manage to do optional regex capture groups (Is that even possible?).

Commit 4be02a9 shows the new syntax in the C++ field-flow tests. If we like what that looks like I'll update the remaining tests.

(I assume Actions will complain a lot about the remaining test diffs until then)

By request from @dbartol, we should wait with merging this until #4432 is merged.

@MathiasVP MathiasVP requested a review from dbartol October 28, 2020 21:35
@MathiasVP MathiasVP requested a review from a team as a code owner October 28, 2020 21:35
@MathiasVP MathiasVP changed the title C++/Python: Better syntax for false positives and negatives inline expectation C++/Python: Better syntax for false positives and negatives in inline expectations Oct 28, 2020
@MathiasVP MathiasVP requested a review from a team October 29, 2020 16:34

@geoffw0 geoffw0 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 from me.

There's a failing test:

FAILED: /home/runner/work/semmle-code/semmle-code/ql/cpp/ql/test/library-tests/ir/points_to/points_to.ql

and as noted in the meeting today you need to mirror the changes in python/ql/test/TestUtilities/InlineExpectationsTest.qll (and update any affected test files)

Comment thread cpp/ql/test/library-tests/dataflow/fields/A.cpp
@MathiasVP

Copy link
Copy Markdown
Contributor Author

👍 from me.

There's a failing test:

FAILED: /home/runner/work/semmle-code/semmle-code/ql/cpp/ql/test/library-tests/ir/points_to/points_to.ql

and as noted in the meeting today you need to mirror the changes in python/ql/test/TestUtilities/InlineExpectationsTest.qll (and update any affected test files)

Yep, I've not yet updated the Python tests, nor the IR tests. I'll run the sync and update the Python version when the InlineExpectationsTest.qll file has been reviewed (which I guess I can do now that you have 👍 'd it from the C++ side).

@MathiasVP

MathiasVP commented Oct 29, 2020

Copy link
Copy Markdown
Contributor Author

Turns out there's a problem parsing the following expectation comment in Python:

$getCommand="cmd1; cmd2"

this worked previously because each expectation annotation was separated with a $, i.e.;

$tag1="a b c" $tag2,tag3=x

meant that we expected tag1="a b c" and tag2,tag3=x.

Previously we did this by splitting the string on occurrences of $s, but now that we've removed the $ that's no longer easy to do. Here's a couple of solutions:

  1. Complicate Extend the parsing of expectations to account for string quotes and avoid splitting the string when we're inside a string
  2. Separate tags using a different symbol (maybe a comma?)

I'm personally in favor of doing option 1 and just see how well it performs. Any thoughts?

@jbj

jbj commented Oct 30, 2020

Copy link
Copy Markdown
Contributor

Option 1 sounds good to me. Anything that can be mostly contained within the regex engine should perform well.

@MathiasVP

MathiasVP commented Oct 30, 2020

Copy link
Copy Markdown
Contributor Author
  • Complicate Extend the parsing of expectations to account for string quotes and avoid splitting the string when we're inside a string

Done in ee77e98.

@MathiasVP MathiasVP force-pushed the better-syntax-for-false-positives-and-negatives-inline-expectation branch from 78b96b7 to 0fa3aff Compare October 30, 2020 14:33
@MathiasVP MathiasVP force-pushed the better-syntax-for-false-positives-and-negatives-inline-expectation branch from 0fa3aff to 45b24a9 Compare October 30, 2020 15:56
@MathiasVP

MathiasVP commented Oct 31, 2020

Copy link
Copy Markdown
Contributor Author

Oh, you Python people are really pushing the inline expectation tests with tests like:

responseBody='<img src="0" onerror="alert(1)">'

(don't get me wrong, I love how you push the syntax forward!).

I guess another round of syntax relaxations are in order: Both double quotes (") and single quotes (') should start a string, and the character that's not used to quote the string can be used inside the string. I hope we'll never get to a point where we need to escape string quotes. 🙏

Edit: Done in 870ed00.

jbj
jbj previously approved these changes Nov 2, 2020

@jbj jbj left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! Thanks for pushing through on this, Mathias.

I'll merge this PR tomorrow morning unless anyone (@github/codeql-python) objects.

@jbj

jbj commented Nov 2, 2020

Copy link
Copy Markdown
Contributor

@yoff

yoff commented Nov 2, 2020

Copy link
Copy Markdown
Contributor

We also have this:

return HttpResponseNotFound(request.GET.get("name")) # $HttpResponse $mimetype=text/html; charset=utf-8 $responseBody=Attribute()

where mimetype has the value text/html; charset=utf-8

@jbj

jbj commented Nov 2, 2020

Copy link
Copy Markdown
Contributor

We also have this:

return HttpResponseNotFound(request.GET.get("name")) # $HttpResponse $mimetype=text/html; charset=utf-8 $responseBody=Attribute()

where mimetype has the value text/html; charset=utf-8

And that has been updated correctly in this PR, right? https://github.com/github/codeql/pull/4571/files#diff-e193c8f3abbe30c8120fb21c82420636aa5aa0e001c2ee246eace1f575429ac1R24-R25

…alse-positives-and-negatives-inline-expectation

Required fixing up semantic conflicts in tests.

Conflicts:
	python/ql/test/experimental/library-tests/frameworks/stdlib/Decoding.py
@yoff

yoff commented Nov 3, 2020

Copy link
Copy Markdown
Contributor

And that has been updated correctly in this PR, right?

Yes, if we can write it like that, that seems fine. I guess it is up to the implementations of InlineTest to decide if string values should always be rendered with quotes. (A few lines above, the value of mimetype is without quotes.)

@yoff yoff left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
(basically, I am happy if the tests pass, I do not think the annotation have been mangled here.)

@jbj jbj merged commit 76fd710 into github:main Nov 3, 2020
@jbj

jbj commented Nov 3, 2020

Copy link
Copy Markdown
Contributor

Yes, if we can write it like that, that seems fine. I guess it is up to the implementations of InlineTest to decide if string values should always be rendered with quotes. (A few lines above, the value of mimetype is without quotes.)

I'm not sure the quote handling is exactly what we want it to be in the long run, but at least the test annotations we have now look good.

@RasmusWL

RasmusWL commented Nov 3, 2020

Copy link
Copy Markdown
Member

Sorry for being late to the party. Blame me for using the string contents as the value for the tag @MathiasVP 😬 it seemed like a neat solution, and would avoid us rewriting test-code to be more verbose such as

-foo("bar") # $fooarg="bar"
+s = "bar"
+foo(s) # $fooarg=s

When I encountered the first string that had a space in it I nervously ran the test, but it all worked, so I moved happily on my way without considering if I had done something terrible 😅

I guess another round of syntax relaxations are in order: Both double quotes (") and single quotes (') should start a string, and the character that's not used to quote the string can be used inside the string. I hope we'll never get to a point where we need to escape string quotes.

Good luck handling print("""""foo"\""""), which prints ""foo""

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants